Natural language processing for biology
Abstract
A large part of the information required for biology research can only be found in free-text form, as in MEDLINE abstracts, or in comment fields of relevant reports, as in GenBank feature table annotations. Such information is important for many types of analysis, such as classification of proteins into functional groups, extraction of protein-protein interaction facts, discovery of new functional relationships, maintaining information on materials and methods, increasing the precision and relevance of hits returned by information retrieval systems, and so on. However, information in free-text form or in comment fields is very difficult for automated systems to use. For example, annotation of the biological function of different proteins is a time-consuming process currently performed by human experts, because genome analysis tools encounter great difficulty in performing this task. The ability to extract information directly from MEDLINE abstracts and other sources can directly help in such a task.
Five papers were accepted under peer review in this session. Previous work in automated understanding of biomedical papers tended to concentrate on analytical tasks such as identifying protein names. We are delighted that all five accepted papers considered substantially less constrained problems that involved finding relationships and contexts.
The paper by Baclawski et al. describes a diagrammatic knowledge representation technique called keynets. The rich ontology of the Unified Medical Language System was used to construct and index keynets. Making full use of domain-independent and domain-specific knowledge, keynets parse texts and resolve references to construct new relationships between entities.
The paper by Humphreys et al. describes two template-based information extraction applications in bioinformatics. The first application is EMPathIE, which is able to extract details of enzyme and metabolic pathways from journal articles. The second application is PASTA, which is able to extract information on the roles of amino acids and active sites in protein molecules from journal articles. They clarified how important template matching is in this field, and applied the technique to terminology recognition.
The paper by Rindflesch et al. describes EDGAR, a natural language processing system that extracts relationships between cancer-related drugs and genes from biomedical literature. EDGAR draws on a combination of technologies: a stochastic part-of-speech tagger, an underspecified syntactic parser, a rule-based system, as well as semantic information from the Unified Medical Language System. The Metathesaurus and the lexicon in the knowledge base are used to identify the structure of noun phrases in MEDLINE …
Similar resources
Natural Language Processing and Systems Biology
This chapter outlines the basic families of applications of natural language processing techniques to questions of interest to systems biologists and describes publicly available resources for such applications.
Using Generalized Language Model for Question Matching
Question answering is one of the popular services on the World Wide Web. The main goal of these services is to find the best answer to a user's input question as quickly as possible. To achieve this aim, most of them use new techniques for question matching. There are many question answering services on the Persian web, so it seems that developing a question matching m...
GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles
Systems that extract structured information from natural language passages have been highly successful in specialized domains. The time is opportune for developing analogous applications for molecular biology and genomics. We present a system, GENIES, that extracts and structures information about cellular pathways from the biological literature in accordance with a knowledge model that we deve...
A new text mining method for extracting user context information to improve search engine result ranking
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...
Towards Incorporating Scientific Literature into Biological Algorithms
This tutorial is a practical introduction to applying natural language processing (NLP) to biological research. It focuses on the basic methodology used in current research efforts, both in the published literature and in our lab. This tutorial exposes attendees to a broad range of work in the field, and at the end they will understand the technologies applied to solve NLP problems in b...
Natural Language Processing for Bioinformatics: The Time is Ripe
Jeffrey T. Chang is a Ph.D. candidate in the Russ Altman lab in the Biomedical Informatics program at Stanford University. His work is focused on applying natural language processing techniques to biological problems ranging from pharmaco-genomics to sequence homology searches. Jeffrey has helped teach informatics classes at Stanford and has also taught a Python Programming Language tutorial at...
Journal title:
- Pacific Symposium on Biocomputing
Volume: 12
Publication date: 2000